-
Notifications
You must be signed in to change notification settings - Fork 17.4k
Prioritize file type matching over first line matching in grammar scoring system #9513
Conversation
@Ingramz Thanks. I buy your argument that the existing override structure is enough. Let me review this... |
@Ingramz If you look at Travis, It looks like some tests related to the change code are failing in core specs. It looks like we're asserting exactly the opposite of the new behavior, so you'll need to update that. Also, you should add a test that asserts that the path-based selection wins over the first line regexp. |
bestMatch | ||
|
||
# Extended: Returns a {Number} representing how well the grammar matches the | ||
# `filePath` and `contents`. | ||
getGrammarScore: (grammar, filePath, contents) -> | ||
return Infinity if @grammarOverrideForPath(filePath) is grammar.scopeName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good.
You had commented with a long description of this logic elsewhere. Could you transfer that into the body of this PR? |
From the other post. This will return Infinity as a score for the overridden grammar, which in my opinion is a little more elegant than 2 + length of file path. The path scoring system is already good as-is. Longer match will get a higher score, plus 0.5 bias for user defined file types. I find it very clever that the user can select between two grammars by adding the file type in question to the user defined list, even if it already was defined in grammar so one of the two grammars always gets priority with the same matched longest file type. Also there is a bias of 0.25 so that a non-bundle package would win over a bundled package, if their matches were the same length (ergo the file types matched are the same), this is the same as the code that was on lines 39/40. However one thing to note - if we get -1 as a file type score for all grammars and there is a non-bundle grammar, then that would win over all bundled grammars with a score of -0.75, it seems that this oddity exists also in original code. I think that this bias shouldn't be applied if the path score itself is -1. <- update: this has been fixed in PR Finally a bias for the case where the first line of the file matches a file, which is currently the weakest - 0.125. This should help in case the two grammars were equal in the previous cases. Setting this with the smallest bias makes more sense to me as it should be used only as a last resort, which works really well for files which the type is unknown. |
Cool, thanks for the explanation. @maxbrunsfeld do you think you could take a look at this too? Maybe we could screen share and just try to think through the implications of changing this and be sure I'm not missing anything. |
The numbers come from the sequence 1/pow(2, n), for which the sum for n=1...infinity is 1, so if we keep going smaller, we hopefully never reach 1 (so let's make sure we're not summing all of the possible terms in that sequence). This way the small increments never override the file type's priority. This also applies recursively, sum for n=2..infinity is 1/2 etc, so all increments smaller than 0.5 will also stay smaller than 0.5, which helps with the priority. |
Cool deal about the numbers. Very nice thought process. Now just I need to see tests of new functionality and a green build. |
Fix activatePackages
@nathansobo seems like the tests are passing now. |
Okay. I'm a bit nervous that I'm not thinking through some implication but this is why we have a beta phase. Merging. Thanks @Ingramz for the thoughtful implementation here. Awesome work. |
Prioritize file type matching over first line matching in grammar scoring system
Fixes #2734
This is a pull request version of #2734 (comment) in order to work together on solving the currently unfair grammar scoring system. The main problem with the current system is that it gives a higher score to HTML grammar for any file that begins with HTML doctype, but the file might be using PHP or some other (templating) language that utilizes HTML syntax (with a different file type, eg
php
).@nathansobo suggested adding an option to fall back to using the old system in case the new one breaks something, but I am not sure how to do that exactly. Also another question would be whether this transition period is needed at all, because all it "breaks" right now is grammar selection for files, for which there is a known file type, but the first line is matched from some other grammar. A fix for that case would be just adding the file type via user configuration, to override the known type from the other language.