Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 57 #58

Merged
merged 16 commits into from Feb 27, 2019
Merged

Issue 57 #58

merged 16 commits into from Feb 27, 2019

Conversation

boyter
Copy link
Owner

@boyter boyter commented Feb 26, 2019

Solves #57

Allows different languages to share the same extension by looking for case sensitive keywords. This will slow down look-ups in these cases but it should not be such a huge issue in practice. The output showed it taking < 1 millisecond per file, and something like 60,000 nanoseconds when I tested. Something to improve in the future though.

Should in theory be backwards compatible with anyone calling into the process using CountStats as well so no issues there.

Also added in V language https://vlang.io/ support while doing this (based on Go).

Once merged going to make this into a new release since its a pretty nice piece of functionality and something that https://github.com/vmchale/polyglot has which this should.

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

Oh Go 1.12 is out as well. Will be able to see if we get some speed boost out of it.

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

$ hyperfine './scc ~/redis-5.0.2/' && hyperfine 'scc ~/redis-5.0.2/'
Benchmark #1: ./scc ~/redis-5.0.2/
  Time (mean ± σ):      68.9 ms ±   1.9 ms    [User: 132.3 ms, System: 268.0 ms]
  Range (min … max):    66.4 ms …  75.6 ms

Benchmark #1: scc ~/redis-5.0.2/
  Time (mean ± σ):      72.2 ms ±   6.5 ms    [User: 117.5 ms, System: 282.7 ms]
  Range (min … max):    66.7 ms …  97.2 ms

A little more speed out of 1.12 which is nice.

processor/structs.go Outdated Show resolved Hide resolved
processor/file.go Outdated Show resolved Hide resolved
processor/workers.go Outdated Show resolved Hide resolved
@dbaggerman
Copy link
Collaborator

Just stylistic comments so far. I'll take a closer look when I've got time.

One thought though, what are the odds of the same extension meaning different things in the same project? If we guess .v to mean verilog in one file, could we cache that and assume that other .v files are also verilog? Would that even be a worthwhile performance boost?

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

That's cool. I have refactored slightly.

I did mention that possibility that on the linked issue actually, #57

Personally I doubt there will be any mixed case. I suspect adding some threshold IE if 20 files are all of one type then lets assume all will be that type might be a good solution. I also think perhaps just checking the first 1000 bytes of the file might be good enough, but I need to try with more real world tests. Sadly all the languages this is built to deal with are not mainstream which makes this harder.

I think it might be acceptable to push this first with the goal to speed it up afterwards (this is up for debate) as I would want scc to support V when it released next month.

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

A benchmark against the coq repository with ~1600 coq files shows no noticeable slowdown. This should represent one of the worst cases as it should be hitting the language determination code fairly heavily.

$ hyperfine 'scc -c' && hyperfine 'tokei' && hyperfine 'loc' && hyperfine '~/polyglot'
Benchmark #1: scc -c
  Time (mean ± σ):     225.2 ms ±   5.2 ms    [User: 330.0 ms, System: 1203.5 ms]
  Range (min … max):   212.3 ms … 231.8 ms

Benchmark #1: tokei
  Time (mean ± σ):     304.7 ms ±   2.6 ms    [User: 442.0 ms, System: 1147.2 ms]
  Range (min … max):   301.5 ms … 309.6 ms

Benchmark #1: loc
  Time (mean ± σ):     255.1 ms ±  28.6 ms    [User: 529.2 ms, System: 1111.0 ms]
  Range (min … max):   213.6 ms … 300.2 ms

Benchmark #1: ~/polyglot
  Time (mean ± σ):      1.063 s ±  0.037 s    [User: 988.6 ms, System: 1394.1 ms]
  Range (min … max):    1.009 s …  1.136 s

Still faster than everything else in this case. I will have a look at the profile for this but at least from a far higher level the price paid for this is not very high.

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

Quick profile, it almost has no real impact. I might tweak it to use indexed loops over range just to see if that cuts some allocations.

image

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

Running against a repository without the need to trigger the logic at all. No impact which is nice.

image

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

No difference being reported between range vs for loop with index. I suspect that perhaps the new GC changes in Go 1.12 might be helping with this. As expected flame graph shows all the time is spent in strings.Index

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

Slight tweak. Reduced the number of characters checked. No longer will it scan the whole file, but only the first 2000 characters. On this repository https://github.com/coq/coq it reduced the % cost by about 50% over checking everything else. I did try 1000 characters which reduced it even further without any accuracy cost but I am not sure if thats a good amount to use as someone may have a heavily commented file header.

@boyter
Copy link
Owner Author

boyter commented Feb 26, 2019

Debating if adding a cache is worth it. The bookkeeping involved might be higher overhead then is worth it especially with it selling out the worst case of a mixed repository which just happens to contain Coq Verilog and V in it. I will need to find a larger repository than the one I have been testing to see if it is worth the cost.

I would argue at this point given that scc is still outperforming all the other tools while being accurate it is not.

ExtensionToLanguage[ext] = name
_, ok := ExtensionToLanguage[ext]

if ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition looks superfluous. appending to a nil slice will create a new slice so the

ExtensionToLanguage[ext] = append(ExtensionToLanguage[ext], name)

line should be all we need.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah neat. Never thought about that but it makes perfect sense.

@dbaggerman
Copy link
Collaborator

That's the last concern from me. Looks good!

@boyter
Copy link
Owner Author

boyter commented Feb 27, 2019

Cool. Ill make that final change and merge in.

@boyter boyter merged commit 32539d3 into master Feb 27, 2019
@boyter boyter deleted the issue57 branch February 27, 2019 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants