Compare/Contrast to cloc #175

Closed
djui opened this Issue Jan 12, 2018 · 5 comments

@djui

djui commented Jan 12, 2018

It would be nice to have a short comparison to cloc in the readme, mostly for compatibility reasons.

@Aaronepower

Owner

Aaronepower commented Feb 11, 2018

Thank you for this issue! I'm not really a cloc power user, so I'm not even sure which features overlap. Speed-wise, I recently compared counting the Firefox 58 source using hyperfine: `tokei .` took on average 6.5 seconds, whereas `cloc --skip-uniqueness .` took 2.5 minutes.
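For anyone who wants to repeat a rough version of that timing without hyperfine, here is a minimal Python sketch. It is a crude stand-in, not what hyperfine does (there are no warmup runs and no statistics beyond the mean), and the helper name `time_command` is made up for illustration.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list of arguments) `runs` times; return mean wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not dominate the measurement.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. To compare the tools,
    # pass ["tokei", "."] or ["cloc", "--skip-uniqueness", "."] instead
    # (assuming both are installed and on PATH).
    print(time_command([sys.executable, "-c", "pass"]))
```

Running each tool from the same directory with the same `runs` value gives two means that can be compared directly.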

@djui

djui commented Feb 12, 2018

For the comparison I’m less interested in performance and more in the algorithm: how lines are detected and counted.

I ran cloc and tokei on a couple of repositories and came up with very different numbers across languages. Getting different numbers can create confusion, which in turn can lead to the suspicion that performance is being bought at the cost of correctness.

To be fair, since (I assume) neither command ever defined a spec for what/how to count, either could be wrong (or have a different focus/defaults). Maybe it’s possible to take a medium-sized, static project, run both tools on it, and then define the result as a reference benchmark for the correct/desired line counts.
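A reference benchmark like that could be checked mechanically. The sketch below assumes the per-language code counts have already been extracted from each tool's output into plain dictionaries; the function name `diff_counts` and the numbers are made-up placeholders, not real results from either tool.

```python
def diff_counts(a, b):
    """Return {language: (count_a, count_b)} for every language whose counts differ."""
    diffs = {}
    for lang in sorted(set(a) | set(b)):
        # A language missing from one side is treated as a count of 0.
        left, right = a.get(lang, 0), b.get(lang, 0)
        if left != right:
            diffs[lang] = (left, right)
    return diffs


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    reference = {"Rust": 100, "Python": 40}
    measured = {"Rust": 100, "Python": 38, "TOML": 5}
    print(diff_counts(reference, measured))
    # prints {'Python': (40, 38), 'TOML': (0, 5)}
```

An empty result would mean the tool matches the reference on every language.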

@Aaronepower

Owner

Aaronepower commented Feb 12, 2018

@djui Okay, I can speak to why the results will probably always differ.

  • tokei doesn't do any uniqueness check of a file. In my opinion any duplication is an intentional choice on the user's part.

  • tokei respects your .gitignore and won't count anything listed in it (including directories and their children). cloc can achieve similar results with `cloc --vcs=git` if you're working in a clean directory, but if you are in the middle of working on a project and haven't added your new files to git yet, cloc won't count them with --vcs.

  • While it might seem that tokei must be sacrificing correctness for performance somewhere, tokei is actually more accurate and covers cases cloc does not. Looking at cloc's limitations section, we can see that cloc treats comment markers inside string literals as real comments. tokei counts those lines as code.

  • cloc uses regular expressions for counting, whereas tokei is just a small state machine that moves linearly through the file. You can actually see how tokei counts a file with `tokei -vvv ./file`. I recommend only using it on small files, as it produces a lot of information.
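To make the last two points concrete, here is a minimal Python sketch of the two approaches for C-like source. It is neither tool's actual implementation: `naive_code_lines` is a deliberately simple regex pass, and `count_lines` is a toy state machine that tracks only `//`, `/* */`, and double-quoted strings (character literals and other languages are ignored). Both function names are invented for illustration.

```python
import re


def naive_code_lines(source):
    """Strip comments with regexes, then count the non-blank lines that remain."""
    stripped = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    stripped = re.sub(r"//[^\n]*", "", stripped)
    return sum(1 for line in stripped.splitlines() if line.strip())


def count_lines(source):
    """Walk the text once, classifying each line as blank, comment, or code."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    in_block = False  # inside /* ... */, carried across lines
    for line in source.splitlines():
        if not line.strip():
            counts["blank"] += 1
            continue
        has_code = False
        in_string = False  # strings are assumed not to span lines
        i = 0
        while i < len(line):
            two = line[i:i + 2]
            if in_block:
                if two == "*/":
                    in_block = False
                    i += 2
                else:
                    i += 1
            elif in_string:
                if line[i] == "\\":
                    i += 2  # skip the escaped character
                else:
                    if line[i] == '"':
                        in_string = False
                    i += 1
            elif two == "//":
                break  # rest of the line is a comment
            elif two == "/*":
                in_block = True
                i += 2
            else:
                if line[i] == '"':
                    in_string = True
                if not line[i].isspace():
                    has_code = True
                i += 1
        counts["code" if has_code else "comment"] += 1
    return counts


if __name__ == "__main__":
    sample = (
        'printf(" /* ");\n'
        'for (i = 0; i < 100; i++) { a += i; }\n'
        'printf(" */ ");\n'
    )
    print(naive_code_lines(sample))  # prints 1
    print(count_lines(sample))       # prints {'blank': 0, 'comment': 0, 'code': 3}
```

The regex pass swallows everything between the two comment markers, even though both sit inside string literals, so it reports one line of code. The state machine knows it is inside a string when it sees them and reports all three.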

@djui

djui commented Feb 12, 2018

I think these are great points that would fit into a/the comparison section. When users get different numbers, checking the comparison section will tell them whether the difference is to be expected or not.

@Aaronepower

Owner

Aaronepower commented Oct 21, 2018

A COMPARISON.md is now available.
