Issue 57 #58
Oh, Go 1.12 is out as well. We'll be able to see whether we get a speed boost out of it.
A little more speed out of 1.12, which is nice.
Just stylistic comments so far. I'll take a closer look when I've got time. One thought, though: what are the odds of the same extension meaning different things in the same project? If we guess
That's cool. I have refactored slightly. I actually mentioned that possibility on the linked issue, #57. Personally I doubt there will be any mixed cases. I suspect adding some threshold might be a good solution, i.e. if 20 files are all of one type, let's assume all will be that type. I also think just checking the first 1000 bytes of the file might be good enough, but I need to try it against more real-world code. Sadly, none of the languages this is built to deal with are mainstream, which makes this harder. I think it might be acceptable to push this first with the goal of speeding it up afterwards (this is up for debate), as I would want
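A minimal sketch of the threshold idea, assuming a hypothetical `thresholdGuesser` type that is not part of scc: once the same extension has resolved to the same language a set number of times in a row (20 in the comment above), later files with that extension skip keyword detection entirely. The "consecutive identical results" rule is an assumption chosen for illustration.

```go
package main

import "fmt"

// thresholdGuesser is a hypothetical helper, not part of scc. It remembers
// the last language detected for each extension and how many times in a row
// that result repeated. Once the streak reaches threshold, the remembered
// language is returned without running detection again.
type thresholdGuesser struct {
	threshold int
	last      map[string]string // extension -> last detected language
	streak    map[string]int    // extension -> consecutive identical results
}

func newThresholdGuesser(threshold int) *thresholdGuesser {
	return &thresholdGuesser{
		threshold: threshold,
		last:      map[string]string{},
		streak:    map[string]int{},
	}
}

// Detect returns the language for a file with extension ext, calling detect
// only while the extension has not yet reached the threshold.
func (g *thresholdGuesser) Detect(ext string, detect func() string) string {
	if g.streak[ext] >= g.threshold {
		return g.last[ext]
	}
	lang := detect()
	if g.last[ext] == lang {
		g.streak[ext]++
	} else {
		g.last[ext] = lang
		g.streak[ext] = 1
	}
	return lang
}

func main() {
	g := newThresholdGuesser(20)
	calls := 0
	detect := func() string { calls++; return "Coq" }
	for i := 0; i < 100; i++ {
		g.Detect("v", detect)
	}
	fmt.Println(calls) // 20: detection stops once 20 files agree
}
```

The trade-off is the one raised in the question above: in a mixed project, a file that appears after the threshold is reached and belongs to a different language would be misclassified.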
A benchmark against the Coq repository, which has ~1600 Coq files, shows no noticeable slowdown. This should represent one of the worst cases, as it should be hitting the language-determination code fairly heavily.
It is still faster than everything else in this case. I will have a look at the profile, but at least at a high level the price paid for this is not very high.
No difference is reported between a range loop and a for loop with an index. I suspect the new GC changes in Go 1.12 might be helping here. As expected, the flame graph shows all the time is spent in strings.Index.
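For reference, the two loop styles being compared look roughly like this. The keyword lists and function names are hypothetical; the point is that both loops do identical work per keyword, so the cost is dominated by the `strings.Index` call, which is consistent with the flame graph.

```go
package main

import (
	"fmt"
	"strings"
)

// countRange counts how many keywords occur in content, using a range loop.
func countRange(content string, keywords []string) int {
	count := 0
	for _, kw := range keywords {
		if strings.Index(content, kw) != -1 {
			count++
		}
	}
	return count
}

// countIndexed does the same work with an index-based for loop.
func countIndexed(content string, keywords []string) int {
	count := 0
	for i := 0; i < len(keywords); i++ {
		if strings.Index(content, keywords[i]) != -1 {
			count++
		}
	}
	return count
}

func main() {
	kws := []string{"Theorem", "Proof", "Qed", "module", "endmodule"}
	src := "Theorem t : True.\nProof. trivial. Qed."
	fmt.Println(countRange(src, kws), countIndexed(src, kws)) // 3 3
}
```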
Slight tweak: reduced the number of characters checked. Rather than scanning the whole file, it now checks only the first 2000 characters. On https://github.com/coq/coq this cut the percentage of time spent in this check by about 50% compared with scanning everything. I did try 1000 characters, which reduced it even further without any accuracy cost, but I am not sure that's a safe amount to use, as someone may have a heavily commented file header.
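A minimal sketch of limiting the keyword check to a prefix of the file, assuming the limit is applied as a byte count and using hypothetical helper names (`prefix`, `containsKeyword`). The example also shows the risk mentioned above: a long comment header pushes the first keyword past the limit.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// maxCheckBytes mirrors the 2000-character limit described above; treating
// it as a byte count is an assumption of this sketch.
const maxCheckBytes = 2000

// prefix returns at most the first limit bytes of content.
func prefix(content []byte, limit int) []byte {
	if len(content) > limit {
		return content[:limit]
	}
	return content
}

// containsKeyword reports whether keyword occurs within the first
// maxCheckBytes bytes of content; anything after that is never scanned.
func containsKeyword(content []byte, keyword string) bool {
	return bytes.Contains(prefix(content, maxCheckBytes), []byte(keyword))
}

func main() {
	// 200 comment lines of 14 bytes each push the keyword past the limit.
	header := strings.Repeat("(* comment *)\n", 200)
	src := []byte(header + "Theorem plus_comm : forall n m : nat, n + m = m + n.")
	fmt.Println(containsKeyword(src, "Theorem")) // false
}
```

A keyword that straddles the cut-off point is also missed, which is part of why a larger limit is the safer choice.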
Debating whether adding a cache is worth it. The bookkeeping involved might be more overhead than it is worth, especially since it would break down in the worst case: a mixed repository that happens to contain Coq, Verilog and V. I will need to find a larger repository than the one I have been testing against to see if it is worth the cost. I would argue at this point given that
processor/processor.go
ExtensionToLanguage[ext] = name
_, ok := ExtensionToLanguage[ext]

if ok {
This condition looks superfluous. Appending to a nil slice will create a new slice, so the line
ExtensionToLanguage[ext] = append(ExtensionToLanguage[ext], name)
should be all we need.
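The reviewer's point can be checked in isolation. This sketch assumes `ExtensionToLanguage` maps each extension to a slice of language names, as the `append` line implies; the `addLanguage` helper is hypothetical.

```go
package main

import "fmt"

// addLanguage registers name as a language for ext. Indexing a map with a
// missing key yields the zero value, a nil slice, and append on a nil slice
// allocates a new one, so no "_, ok :=" existence check is needed.
func addLanguage(m map[string][]string, ext, name string) {
	m[ext] = append(m[ext], name)
}

func main() {
	// Hypothetical stand-in for ExtensionToLanguage, assumed here to map
	// each extension to every language that claims it.
	extensionToLanguage := map[string][]string{}
	addLanguage(extensionToLanguage, "v", "Coq")
	addLanguage(extensionToLanguage, "v", "Verilog")
	addLanguage(extensionToLanguage, "v", "V")
	addLanguage(extensionToLanguage, "go", "Go")

	fmt.Println(extensionToLanguage["v"])  // [Coq Verilog V]
	fmt.Println(extensionToLanguage["go"]) // [Go]
}
```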
Ah neat. Never thought about that but it makes perfect sense.
That's the last concern from me. Looks good!
Cool. I'll make that final change and merge it in.
Solves #57
Allows different languages to share the same extension by looking for case-sensitive keywords. This slows down lookups for those shared extensions, but it should not be a big issue in practice: when I tested, the output showed it taking well under 1 millisecond per file, roughly 60,000 nanoseconds (60 microseconds). Something to improve in the future, though.
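A minimal sketch of keyword-based disambiguation for a shared extension such as `.v`. The keyword lists, the `guessLanguage` name, and the "most keyword occurrences wins" scoring rule are all assumptions for illustration; only the idea of matching case-sensitive keywords comes from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// languageKeywords is a hypothetical table of case-sensitive keywords per
// language; scc's real language definitions may differ.
var languageKeywords = map[string][]string{
	"Coq":     {"Require", "Theorem", "Proof", "Qed"},
	"Verilog": {"module", "endmodule", "always", "wire"},
	"V":       {"fn ", "mut ", "import ", "struct "},
}

// guessLanguage returns the candidate whose keywords occur most often in
// content. strings.Count is case-sensitive, so Coq's "Import" does not count
// as V's "import ". Ties go to the earliest candidate in the slice.
func guessLanguage(content string, candidates []string) string {
	best, bestScore := "", -1
	for _, lang := range candidates {
		score := 0
		for _, kw := range languageKeywords[lang] {
			score += strings.Count(content, kw)
		}
		if score > bestScore {
			best, bestScore = lang, score
		}
	}
	return best
}

func main() {
	candidates := []string{"Coq", "Verilog", "V"}
	src := "module counter(input clk);\n  always @(posedge clk) begin end\nendmodule\n"
	fmt.Println(guessLanguage(src, candidates)) // Verilog
}
```

Iterating over a candidate slice rather than the map keeps the tie-breaking deterministic, since Go map iteration order is randomized.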
In theory this should also remain backwards compatible for anyone calling into the processor using CountStats, so no issues there.
Also added support for the V language, https://vlang.io/ (based on Go), while doing this.
Once merged, I'm going to make this into a new release, since it's a pretty nice piece of functionality and something that https://github.com/vmchale/polyglot has which this should have too.