Python docstrings are counted as code #62
So the reason
I don't think it is possible to tell a docstring apart from an ordinary string in the file without parsing the code into an AST, which is going to be incredibly slow compared to how all of the tools currently work. I am not sure how Python treats them in the interpreter, but I believe it actually processes them on every run, which would mean that, according to Python, they are lines of code as well. I believe this was raised enough with tokei which produced the If you can think of a way to reliably identify when it is a docstring vs. an actual string, perhaps with some test cases, I would be happy to implement this, though. Some thoughts I have to help with this:
Not sure if that is exclusive enough to catch all cases, though.
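For comparison, the AST-based route mentioned above can be sketched with Python's standard-library `ast` module (Python 3.8+, for `end_lineno`). This is a minimal sketch assuming the input parses; `docstring_lines` is a hypothetical helper, not part of scc. It only reports strings in the positions Python itself treats as docstrings (the first statement of a module, class or function), so an assigned string is left out.

```python
from __future__ import annotations

import ast

# Node types whose first statement can be a docstring.
DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_lines(source: str) -> set[int]:
    """Return the 1-based line numbers covered by docstrings in `source`.

    Raises SyntaxError if `source` is not valid Python.
    """
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOC_OWNERS) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string expression used as the first statement.
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines


sample = '''"""Module docstring."""

def greet(name):
    """Return a greeting.

    The greeting is not localised.
    """
    message = """not a docstring"""
    return message
'''

print(sorted(docstring_lines(sample)))  # prints [1, 4, 5, 6, 7]
```

The trade-off is the one raised in the thread: this is exact for valid code, but it needs a full parse and fails outright (with `SyntaxError`) on malformed files.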
The method you describe is indeed the best heuristic I can think of. If you want to be even stricter, you can also add a condition that the previous line does not end with a backslash, as in:
message = \
"""
hello
world
"""
Here is how I would rank the various strategies, from the worst to the best:
Of course it's possible to use a real parser, but it needs to work with various versions of Python, and be resilient to malformed code. It would probably need one of these heuristics as a fallback, anyway. And even then, I doubt a real parser would be a significant improvement over strategy number 4. For my own usage (on my own Python code base), a tool that uses strategy 1 is perfectly useless, while strategy 2 is perfectly fine.
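The line-based heuristic discussed above (a line that starts with a triple quote opens a docstring, unless the previous line ends with a backslash) can be sketched as a small line counter. This is a hypothetical illustration, not scc's implementation; `count_lines` and its output keys are made up for the example. It also tracks triple-quoted strings opened mid-line (as in `message = """`) so that their closing line is not mistaken for a new docstring. It ignores string prefixes such as `r"""` and triple quotes that appear inside `#` comments or ordinary strings on code lines.

```python
from __future__ import annotations

TRIPLE_QUOTES = ('"""', "'''")


def count_lines(source: str) -> dict[str, int]:
    """Count code, comment and blank lines in Python source (heuristic).

    Hypothetical rules, not scc's implementation:
    - a line whose first non-blank characters are a triple quote opens a
      docstring, unless the previous line ends with a backslash;
    - every line of such a docstring (blank ones included) is a comment;
    - triple-quoted strings opened anywhere else are code.
    """
    counts = {"code": 0, "comments": 0, "blanks": 0}
    open_quote: str | None = None  # delimiter of the string still open, if any
    open_is_doc = False  # whether that open string is a docstring
    prev = ""
    for line in source.splitlines():
        stripped = line.strip()
        if open_quote is not None:
            # Inside a multi-line string: keep the classification it started with.
            counts["comments" if open_is_doc else "code"] += 1
            if open_quote in stripped:
                open_quote = None
        elif not stripped:
            counts["blanks"] += 1
        elif stripped.startswith("#"):
            counts["comments"] += 1
        else:
            is_doc = (stripped.startswith(TRIPLE_QUOTES)
                      and not prev.rstrip().endswith("\\"))
            counts["comments" if is_doc else "code"] += 1
            # An odd number of a triple delimiter leaves that string open.
            for quote in TRIPLE_QUOTES:
                if stripped.count(quote) % 2 == 1:
                    open_quote, open_is_doc = quote, is_doc
                    break
        prev = line
    return counts


# The backslash example from the comment above: the string is code.
backslash_example = 'message = \\\n"""\nhello\nworld\n"""\n'
# A real docstring: its four lines count as comments.
doc_example = ('def greet(name):\n'
               '    """Return a greeting.\n'
               '\n'
               '    The greeting is not localised.\n'
               '    """\n'
               '    return f"Hello, {name}"\n')

print(count_lines(backslash_example))  # {'code': 5, 'comments': 0, 'blanks': 0}
print(count_lines(doc_example))  # {'code': 2, 'comments': 4, 'blanks': 0}
```

Checking for an open string before checking for blank lines is a design choice: blank lines inside a docstring are counted with the docstring rather than as blanks.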
I am not a huge fan of option 2 personally. I would rather implement this properly. I just need to figure out how to change the JSON language structure to support this, and then I will start implementing it.
Attempting to implement this on the branch https://github.com/boyter/scc/tree/issue62
Since this is related to #71, I am implementing a more generic solution there and discarding the work on this branch. I will remove the branch eventually, once I have things working.
The JSON changes needed to support this are in #76. Once all three pending PRs are merged, I will resume work on this issue.
Add test for issue62 (boyter#62)
Merged in, should be all good now.
I tried scc 2.2.0 on Windows on a Python file, and it appears that Python docstrings are counted as code instead of comments.
lines: 12
code: 10
comments: 0
blanks: 2
This should be:
lines: 12
code: 3
comments: 7
blanks: 2